Lesson 3 Introduction: Solving Non-linear Classification Problems

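We are moving beyond the limits of linear models, which struggle with data that cannot be separated by a straight line. Today we will use the PyTorch workflow to build a deep neural network (DNN) that can learn complex, non-linear decision boundaries, which are essential for real-world classification tasks.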

1. The Need to Visualize Non-linear Data

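Our first step is to build a challenging synthetic dataset, such as the two-moons distribution, to show visually why a simple linear model fails. This setup forces us to use a deep architecture to approximate the complex curve needed to separate the classes.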
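As a rough illustration of such a dataset, the sketch below generates two interlocking half-circles using only PyTorch. The function name `make_two_moons`, the arc offsets, the noise level, and the seed are all assumptions chosen for this example, not values taken from the lesson.

```python
import math
import torch

def make_two_moons(n_samples=200, noise=0.1, seed=0):
    """Hypothetical helper: returns X of shape (n_samples, 2) and y of shape (n_samples, 1)."""
    gen = torch.Generator().manual_seed(seed)
    n_upper = n_samples // 2
    n_lower = n_samples - n_upper

    # Upper moon: half circle traced from angle 0 to pi (class 0).
    t_upper = torch.linspace(0, math.pi, n_upper)
    upper = torch.stack([torch.cos(t_upper), torch.sin(t_upper)], dim=1)

    # Lower moon: flipped half circle, shifted so the two arcs interlock (class 1).
    t_lower = torch.linspace(0, math.pi, n_lower)
    lower = torch.stack([1 - torch.cos(t_lower), 0.5 - torch.sin(t_lower)], dim=1)

    X = torch.cat([upper, lower], dim=0)
    X = X + noise * torch.randn(X.shape, generator=gen)  # Gaussian jitter
    y = torch.cat([torch.zeros(n_upper, 1), torch.ones(n_lower, 1)], dim=0)
    return X, y

X, y = make_two_moons()
```

Because the arcs curl around each other, no single straight line puts all of one class on each side, which is the property the lesson relies on.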

The Power of Non-linear Activation Functions

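The core principle of a deep neural network is to introduce non-linearity in the hidden layers through functions such as ReLU. Without it, a stack of layers, however deep, collapses into a single large linear model, because a composition of linear maps is itself linear.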
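The collapse described above can be checked numerically. The sketch below stacks two `nn.Linear` layers without an activation and compares the result with one hand-built linear map. The layer sizes, variable names, and seed are arbitrary choices for illustration.

```python
import torch
import torch.nn as nn

torch.manual_seed(0)
x = torch.randn(5, 2)

lin1 = nn.Linear(2, 8)
lin2 = nn.Linear(8, 1)

with torch.no_grad():
    # Two stacked linear layers with no activation in between.
    stacked = lin2(lin1(x))

    # The same computation as one linear layer:
    # W = W2 @ W1 and b = W2 @ b1 + b2.
    W = lin2.weight @ lin1.weight
    b = lin2.weight @ lin1.bias + lin2.bias
    collapsed = x @ W.T + b

    same_without_relu = torch.allclose(stacked, collapsed, atol=1e-5)

    # Inserting ReLU between the layers breaks this equivalence in general,
    # because negative pre-activations are clipped to zero.
    with_relu = lin2(torch.relu(lin1(x)))
```

Here `same_without_relu` should come out `True`, while `with_relu` will generally differ from `collapsed`.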

Question 1

What is the primary purpose of the ReLU activation function in a hidden layer?

Introduce non-linearity so deep architectures can model curves

Speed up matrix multiplication

Ensure the output remains between 0 and 1

Normalize the layer output to a mean of zero

Question 2

Which activation function is used in the output layer to produce a single probability for a binary classification task?

Sigmoid

Softmax

ReLU

Question 3

Which loss function corresponds directly to a binary classification problem using a Sigmoid output?

Binary Cross Entropy Loss (BCE)

Mean Squared Error (MSE)

Cross Entropy Loss

Challenge: Designing the Core Architecture

Integrating architectural components for non-linear learning.

You must build an nn.Module for the two-moons task. Input features: 2. Output units: 1 (the probability of the positive class).

Step 1

Describe the flow of computation for a single hidden layer in this DNN.

Solution:
Input $\to$ Linear Layer (weight matrix and bias) $\to$ ReLU Activation $\to$ Output to Next Layer.
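This flow can be sketched as a small `nn.Sequential` block. The hidden width of 16 and the names `hidden_layer`, `x`, and `h` are assumptions made for this example; the lesson does not fix a width.

```python
import torch
import torch.nn as nn

hidden_units = 16  # hypothetical width, not specified by the lesson

# One hidden layer: affine transform followed by an element-wise ReLU.
hidden_layer = nn.Sequential(
    nn.Linear(2, hidden_units),  # computes x @ W.T + b
    nn.ReLU(),                   # clips negative values to zero
)

x = torch.randn(4, 2)   # a batch of 4 samples with 2 features each
h = hidden_layer(x)     # shape (4, hidden_units), passed on to the next layer
```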

Step 2

What must the final layer size be if the input shape is $(N, 2)$ and we use BCE loss?

Solution:
The output layer must have 1 unit, so the output tensor has shape $(N, 1)$: a single probability score per sample, matching the label shape.
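Putting both steps together, one possible module for the challenge might look like the sketch below. The class name `MoonClassifier`, the two hidden layers, and the width of 16 are assumptions for illustration, and the random inputs and labels stand in for real two-moons data.

```python
import torch
import torch.nn as nn

class MoonClassifier(nn.Module):
    """Hypothetical binary classifier: 2 input features, 1 output probability."""

    def __init__(self, hidden_units=16):
        super().__init__()
        self.net = nn.Sequential(
            nn.Linear(2, hidden_units), nn.ReLU(),
            nn.Linear(hidden_units, hidden_units), nn.ReLU(),
            nn.Linear(hidden_units, 1), nn.Sigmoid(),  # one unit, squashed into (0, 1)
        )

    def forward(self, x):
        return self.net(x)

model = MoonClassifier()
x = torch.randn(8, 2)                          # placeholder batch, shape (N, 2)
labels = torch.randint(0, 2, (8, 1)).float()   # placeholder labels, shape (N, 1)

probs = model(x)                   # shape (N, 1)
loss = nn.BCELoss()(probs, labels) # BCE expects float targets of the same shape
```

A common variant drops the final `nn.Sigmoid()` and uses `nn.BCEWithLogitsLoss`, which applies the sigmoid inside the loss for better numerical stability.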